MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects

Authors

  • Harm-Jan Westra
  • Ritsert C. Jansen
  • Rudolf S. N. Fehrmann
  • Gerard J. te Meerman
  • David van Heel
  • Cisca Wijmenga
  • Lude Franke
Abstract

Motivation: Sample mix-ups can arise during sample collection, handling, genotyping or data management. It is unclear how often sample mix-ups occur in genome-wide studies, as there currently are no post hoc methods that can identify these mix-ups in unrelated samples. We have therefore developed an algorithm (MixupMapper) that can both detect and correct sample mix-ups in genome-wide studies that study gene expression levels.

Results: We applied MixupMapper to five publicly available human genetical genomics datasets. On average, 3% of all analyzed samples had been assigned incorrect expression phenotypes: in one of the datasets 23% of the samples had incorrect expression phenotypes. The consequences of sample mix-ups are substantial: when we corrected these sample mix-ups, we identified on average 15% more significant cis-expression quantitative trait loci (cis-eQTLs). In one dataset, we identified three times as many significant cis-eQTLs after correction. Furthermore, we show through simulations that sample mix-ups can lead to an underestimation of the explained heritability of complex traits in genome-wide association datasets.

Availability and implementation: MixupMapper is freely available at http://www.genenetwork.nl/mixupmapper/
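The attenuation that the abstract attributes to sample mix-ups can be illustrated with a small simulation. The sketch below is not MixupMapper and does not reproduce the authors' simulation design. It assumes a single variant with an additive effect, and every name and parameter in it (`simulate_r2`, `effect`, `mixup_fraction`, the allele frequency of 0.5) is hypothetical. It shuffles the phenotypes of a chosen fraction of samples and reports the squared genotype-phenotype correlation, which serves here as a rough stand-in for variance explained.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_r2(n_samples=5000, effect=0.5, mixup_fraction=0.0, seed=1):
    """Squared genotype-phenotype correlation after mixing up some samples.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Genotypes coded 0/1/2, drawn at an assumed allele frequency of 0.5.
    genotypes = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_samples)]
    # Additive genetic effect plus standard normal noise.
    phenotypes = [effect * g + rng.gauss(0.0, 1.0) for g in genotypes]

    # Pick the samples to mix up and permute their phenotypes among themselves.
    n_mixed = int(round(mixup_fraction * n_samples))
    chosen = rng.sample(range(n_samples), n_mixed)
    targets = chosen[:]
    rng.shuffle(targets)
    mixed = phenotypes[:]
    for source, target in zip(chosen, targets):
        mixed[target] = phenotypes[source]

    return pearson_r(genotypes, mixed) ** 2


if __name__ == "__main__":
    for fraction in (0.0, 0.03, 0.23):
        print(f"mix-up fraction {fraction:.2f}: r^2 = {simulate_r2(mixup_fraction=fraction):.4f}")
```

The fractions 0.03 and 0.23 in the usage loop echo the average and worst-case mix-up rates reported in the abstract. The effect size and sample count are arbitrary, so the printed values indicate only the direction of the bias, not its magnitude in real data.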

Similar resources

Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study

In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion (18%) of sample mix-ups in the genotype data. Local expression quantitative trait loci (eQTL; genetic loci influencing gene expression) with extremely large effect were used to form a classifier to predict an individual's eQTL genotype based on expression data al...

Applications of multiplex ligation-dependent probe amplification (MLPA) method in diagnosis of cancer and genetic disorders

Introduction: Lots of human diseases and syndromes result from partial or complete gene deletions and duplications or changes of certain specific chromosomal sequences. Many various methods are used to study the chromosomal aberrations including Comparative Genomic Hybridization (CGH), Fluorescent in Situ Hybridization (FISH), Southern blots, Multiplex Amplifiable Probe Hybridisation (MAP...

Whose sample is it anyway ? Widespread misannotation of samples in transcriptomics studies

Concern about the reproducibility and reliability of biomedical research has been rising. An understudied issue is the prevalence of sample mislabeling, one impact of which would be invalid comparisons. We studied this issue in a corpus of human transcriptomics studies by comparing the provided annotations of sex to the expression levels of sex-specific genes. We identified apparent mislabeled ...

Spatially Uniform ReliefF: Increasing the Power to Detect Epistasis in Genetic Association Studies

Background: Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these poin...

Multimodal Transportation p-hub Location Routing Problem with Simultaneous Pick-ups and Deliveries

Centralizing and using proper transportation facilities cut down costs and traffic. Hub facilities concentrate on flows to cause economic advantage of scale and multimodal transportation helps use the advantage of another transporter. A distinctive feature of this paper is proposing a new mathematical formulation for a three-stage p-hub location routing problem with simultaneous pick-ups and de...

Journal:
  • Bioinformatics

Volume 27, Issue 15

Pages: -

Publication date: 2011